Low latency and tight resources viseme recognition from speech using an artificial neural network
Authors
Abstract
We present a speech-driven, real-time viseme recognition system based on an Artificial Neural Network (ANN). A Multi-Layer Perceptron (MLP) is used to provide a light and responsive framework adapted to the final application, i.e., animating the lips of an avatar on multi-task platforms with limited embedded resources and latency constraints. Several improvements to this system are studied, such as data selection, network size, training set size, and the choice of the best acoustic unit to recognize. All variants are compared to a baseline system, and the combined improvements achieve a recognition rate of 64.3% for a set of 18 visemes and 70.8% for 9 visemes. We then propose a system that trades off recognition performance against resource requirements and latency constraints. A scalable method is also described.

Key-words: Speech Processing, Lip Animation, Visemes, Artificial Neural Network, Computational Cost.

1 IRISA/Université de Rennes 1
2 IRISA/CNRS UMR 6074

hal-00848629, version 1, 26 Jul 2013
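The abstract does not specify the network inputs, context window, or layer sizes. As a rough illustration of why network size and latency trade off against each other, here is a minimal pure-Python sketch of a frame-level MLP viseme classifier. Everything in it is an assumption for illustration, not the authors' implementation: 13 MFCC-like features per frame, a 10 ms frame shift, a context of 5 past and 2 future frames, one sigmoid hidden layer of 64 units, random untrained weights, and hypothetical helper names (`stack_context`, `init_layers`, `mlp_forward`, `mlp_cost`, `lookahead_latency_ms`). Only the 18-viseme output size comes from the abstract.

```python
import math
import random


def stack_context(frames, t, past, future):
    """Concatenate feature vectors t-past .. t+future into one input vector.

    Indices outside the utterance are clamped to the first/last frame.
    """
    x = []
    for k in range(t - past, t + future + 1):
        k = min(max(k, 0), len(frames) - 1)
        x.extend(frames[k])
    return x


def init_layers(sizes, seed=0):
    """Random (untrained) weights: one (W, b) pair per layer, W stored as out x in rows."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def mlp_forward(x, layers):
    """Sigmoid hidden layers, softmax output: returns class posteriors."""
    h = x
    for i, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            h = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            m = max(z)  # subtract the max for a numerically stable softmax
            e = [math.exp(v - m) for v in z]
            s = sum(e)
            h = [v / s for v in e]
    return h


def mlp_cost(n_inputs, hidden_sizes, n_outputs):
    """Return (parameter count, multiply-accumulates per frame) for a dense MLP."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    pairs = list(zip(sizes, sizes[1:]))
    params = sum(a * b + b for a, b in pairs)  # weights + biases
    macs = sum(a * b for a, b in pairs)
    return params, macs


def lookahead_latency_ms(future_frames, frame_shift_ms):
    """Algorithmic delay: the classifier must wait for `future_frames` extra frames."""
    return future_frames * frame_shift_ms


# Hypothetical settings; only N_VISEMES = 18 is taken from the abstract.
N_FEATS, PAST, FUTURE, N_VISEMES, SHIFT_MS = 13, 5, 2, 18, 10.0

rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATS)] for _ in range(50)]
sizes = [N_FEATS * (PAST + FUTURE + 1), 64, N_VISEMES]
layers = init_layers(sizes)

probs = mlp_forward(stack_context(frames, 10, PAST, FUTURE), layers)
viseme = max(range(N_VISEMES), key=probs.__getitem__)  # most likely viseme index

print(mlp_cost(sizes[0], sizes[1:-1], sizes[-1]))  # (7890, 7808)
print(lookahead_latency_ms(FUTURE, SHIFT_MS))      # 20.0
```

Past frames are already available when a frame is classified, so only the future context adds algorithmic latency (here 2 × 10 ms = 20 ms), while the per-frame multiply-accumulate count grows with both the context width and the hidden-layer size. This is the kind of recognition/resources/latency trade-off the abstract refers to.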
Similar resources
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on extracting features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking style and environment. Here, an SER method is proposed based on a concat...
LIQUEFACTION POTENTIAL ASSESSMENT USING MULTILAYER ARTIFICIAL NEURAL NETWORK
In this study, a low-cost, rapid and qualitative evaluation procedure using dynamic pattern recognition analysis is presented to assess liquefaction potential, which is useful for planning, zoning, general hazard assessment, and delineation of areas. Dynamic pattern recognition using neural networks is generally considered to be an effective tool for assessing hazard potential on the b...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...
Voiceless Speech Recognition Using Dynamic Visual Speech Features
This paper describes a voiceless speech recognition technique that uses dynamic visual features to represent facial movements during phonation. The dynamic features extracted from the mouth video are used to classify utterances without using the acoustic data. The audio signals of consonants are more confusable than those of vowels, and the facial movements involved in pronunciation of consonants ...